Time-frequency analysis of vocal source signal for speaker recognition

Authors

Nengheng Zheng

Pak-Chung Ching

Tan Lee

Abstract

This paper investigates the importance of the spectro-temporal characteristics of the source excitation signal for speaker recognition. We propose an effective feature extraction technique for obtaining essential time-frequency information from the linear prediction (LP) residual signal, which is closely related to the glottal excitation of the individual speaker. With pitch-synchronous analysis, a wavelet transform is applied to every two pitch cycles of the LP residual signal to generate a new feature vector, called Wavelet Octave Coefficients of Residues (WOCOR), which provides additional speaker-discriminative power to the commonly used linear predictive cepstral coefficients (LPCC). Experimental evaluation on a Cantonese speaker recognition corpus demonstrates the effectiveness of WOCOR for speaker recognition. Recognition tests using WOCOR together with LPCC outperform the conventional method of using Mel-frequency cepstral coefficients (MFCC).
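The pipeline outlined in the abstract (LP inverse filtering, then a wavelet analysis of two-pitch-cycle segments of the residual) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the wavelet, the LP order, the pitch-marking method, or how coefficients are grouped into the final vector. The sketch assumes a Haar wavelet, an LP order of 12, pitch marks supplied by the caller, and one norm per octave as the feature. The function names `lp_residual`, `haar_octave_norms`, and `wocor_like_features` are hypothetical.

```python
import numpy as np

def lp_residual(signal, order=12):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], lightly regularised for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Prediction x_hat[t] = sum_k a[k] * x[t-1-k]; the residual is x - x_hat
    pred = np.zeros(n)
    for k in range(order):
        pred[k + 1:] += a[k] * x[:n - k - 1]
    return x - pred

def haar_octave_norms(segment, levels=4):
    """L2 norm of the Haar detail coefficients at each octave (Haar is an assumption)."""
    approx = np.asarray(segment, dtype=float)
    norms = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = np.append(approx, 0.0)  # zero-pad to an even length
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)
        approx = (even + odd) / np.sqrt(2)
        norms.append(np.linalg.norm(detail))
    return np.array(norms)

def wocor_like_features(residual, pitch_marks, levels=4):
    """One feature vector per window spanning two consecutive pitch cycles.

    pitch_marks: sample indices of pitch-cycle boundaries, assumed to come
    from an external pitch tracker (not shown here).
    """
    feats = []
    for i in range(len(pitch_marks) - 2):
        seg = residual[pitch_marks[i]:pitch_marks[i + 2]]
        seg = seg / (np.linalg.norm(seg) + 1e-12)  # remove amplitude dependence
        feats.append(haar_octave_norms(seg, levels))
    return np.array(feats)
```

Because each window covers two pitch cycles and advances by one, consecutive windows overlap by one cycle. In a real system the resulting vectors would be modelled alongside LPCC, which this sketch does not attempt.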


Similar resources

Time-Frequency Representation of Vocal Source Signal for Speaker Verification

We propose an effective feature extraction technique for obtaining essential time-frequency information from the linear prediction (LP) residual signal, which is closely related to the glottal vibration of the individual speaker. With pitch-synchronous analysis, a wavelet transform is applied to every two pitch cycles of the LP residual signal to generate a new feature vector, called Wavelet Based F...


Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification

This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt the cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. The cepstral fea...


Comparative Analysis of Discrimination Power of the Vocal Source and Vocal Tract Features for Speaker Verification

The paper comparatively analyzes the speaker discrimination power of the vocal source and vocal tract related features and presents a speaker verification system that optimally utilizes the source and tract related speaker-specific information. A pitch-synchronous wavelet transform is adopted to capture the speaker-specific information from the vocal source signal, particularly the Linear Prediction ...


Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their performance and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


Down-sampling speech representation in ASR

Features for automatic speech recognition (ASR) are typically sampled at about 100 Hz (10 ms analysis step). Recent experiments indicate that the most efficient components of the modulation spectrum of speech for ASR are up to about 16 Hz [1]. Consequently, RASTA processing attenuates modulation frequencies higher than 16 Hz and should in principle allow for a subsequent down-sampling of the feat...




Publication date: 2004
